Comparing Dissimilarity Measures for Symbolic Data Analysis
Abstract
Nowadays, data analysts are confronted with new challenges: they are asked to process data that go beyond the classical framework, as in the case of data concerning more or less homogeneous classes or groups of individuals (second-order objects) instead of single individuals (first-order objects). A typical situation is that of census data, which raise privacy issues in all governmental agencies that distribute them. To guarantee that data analysts cannot identify an individual or a single business establishment, data are made available in aggregate form. Data aggregations by census tracts or by enumeration districts are examples of second-order objects.
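As a minimal sketch of the aggregation described above, the following Python snippet collapses individual records (first-order objects) into one description per census tract (second-order objects). It assumes an interval-valued representation, which is one common choice in symbolic data analysis and not necessarily the one used in the paper. The function name `aggregate_to_intervals`, the field names, and all values are hypothetical and serve only as illustration.

```python
from collections import defaultdict


def aggregate_to_intervals(records, group_key, variables):
    """Collapse first-order records into one interval-valued description per group.

    Each group (e.g. a census tract) is described, for every variable,
    by the (min, max) interval of the values observed among its members.
    """
    grouped = defaultdict(list)
    for rec in records:
        grouped[rec[group_key]].append(rec)

    result = {}
    for group, members in grouped.items():
        result[group] = {
            var: (min(m[var] for m in members), max(m[var] for m in members))
            for var in variables
        }
    return result


# Hypothetical individual-level records (made-up values, illustration only).
records = [
    {"tract": "A", "age": 23, "income": 18000},
    {"tract": "A", "age": 41, "income": 32000},
    {"tract": "B", "age": 35, "income": 27000},
    {"tract": "B", "age": 62, "income": 21000},
]

# Only the aggregated intervals would be released; individual rows stay private.
tracts = aggregate_to_intervals(records, "tract", ["age", "income"])
```

Here `tracts["A"]` holds one interval per variable instead of the two underlying individuals, which is the kind of aggregate description that a dissimilarity measure for symbolic data has to compare.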
Similar resources
Dissimilarity measures for histogram-valued data and divisive clustering of symbolic objects
Contemporary datasets are becoming increasingly larger and more complex, while techniques to analyse them are becoming more and more inadequate. Thus, new methods are needed to handle these new types of data. This study introduces methods to cluster histogram-valued data. However, histogram-valued data are difficult to handle computationally because observations typically have a different numbe...
Clustering Symbolic Time-Series using L-tuples
Among the many dimensionality reduction methods for time-series data, Symbolic Aggregate approXimation (SAX) is perhaps the most popular due to its simplicity and uniqueness. With SAX, time-series data can be represented as string sequences, which enables the use of methods from text mining and bioinformatics to enhance data mining tasks. We propose an application of L-tuples to impro...
A New Symbolic Dissimilarity Measure for Multivalued Data Type and Novel Dissimilarity Approximation Techniques
In this paper a new statistical measure for estimating the degree of dissimilarity between two symbolic objects whose features are multivalued symbolic data type is proposed. In addition two new simple representation techniques viz., interval type and magnitude type for the computed dissimilarity between the symbolic objects are introduced. The dissimilarity matrices obtained are not necessaril...
Analysis of Distribution Valued Dissimilarity Data
We deal with methods for analyzing complex structured data, especially, distribution valued data. Nowadays, there are many requests to analyze various types of data including spatial data, time series data, functional data and symbolic data. The idea of symbolic data analysis proposed by Diday covers a large range of data structures. We focus on distribution valued dissimilarity data and multid...
Comparing Dissimilarity Measures: A Case of Banking Ratios
The aim of this paper is twofold. Firstly, to discuss a clustering of a given set of the European banks into groups based on their performance during 1999–2013. Secondly, to compare different dissimilarity measures and to determine which of them suits best for clustering banking ratios. Six ratios that reveal profitability, efficiency, stability and loan portfolio quality of the banks were used...